Abstract

Background: The Amazon River is by far the world's largest river in terms of discharge volume and watershed area, generating a fluvial export that accounts for about a fifth of riverine input into the world's oceans. Marine microbial communities of the Western Tropical North Atlantic Ocean are strongly affected by the terrestrial materials carried by the Amazon plume, including dissolved (DOC) and particulate organic carbon (POC) and inorganic nutrients, with impacts on primary productivity and carbon sequestration.

Results: We inventoried genes and transcripts at six stations in the Amazon River plume during June 2010. At each station, internal standard-spiked metagenomes and non-selective metatranscriptomes were obtained in duplicate for two discrete size fractions (0.2 to 2.0 µm and 2.0 to 156 µm), and poly(A)-selective metatranscriptomes were obtained in duplicate for the larger size fraction, using 150 x 150 bp paired-end Illumina sequencing. Following quality control, the dataset contained 360 million reads of approximately 200 bp average length from Bacteria, Archaea, Eukarya, and viruses. Bacterial metagenomes and metatranscriptomes were dominated by Synechococcus, Prochlorococcus, SAR11, SAR116, and SAR86, with high contributions from SAR324 and Verrucomicrobia at some stations. Diatoms, green picophytoplankton, dinoflagellates, haptophytes, and copepods dominated the eukaryotic genes and transcripts. Gene expression ratios differed by station, size fraction, and microbial group, with transcription levels varying over three orders of magnitude across taxa and environments.

Conclusions: This first comprehensive inventory of microbial genes and transcripts, benchmarked with internal 
standards for full quantitation, is generating novel insights into biogeochemical processes of the Amazon plume 
and improving prediction of climate change impacts on the marine biosphere. 
* Correspondence: mmoran@uga.edu

Background

The Amazon River runs nearly 6,500 km across the South American continent before emptying into the Western Tropical North Atlantic Ocean; in terms of both volume and watershed area it is the world's largest riverine system [1]. The river carries a significant load of terrestrially derived nutrients to the ocean, and this has global consequences for marine primary productivity and carbon sequestration [2,3]. Productive phytoplankton blooms harboring cyanobacteria, coastal diatom species, and oceanic diatoms with endosymbiotic diazotrophs take advantage of the riverine nutrient supplements and enhance carbon export from the upper ocean to deeper waters via sinking particles [3,4]. Heterotrophic bacteria also remineralize organic nutrients in the plume, further fueling primary production and increasing the flux of organic material to deep water.

We inventoried the microbial genes and transcripts at six stations in the Amazon River plume aboard the R/V Knorr between 22 May and 25 June, 2010 (Figure 1) using Illumina sequencing with 150 x 150 bp overlapping paired-end reads. Metagenomic and metatranscriptomic
[…] work (that is, % of metagenome and % of metatranscriptome), but this approach is problematic for dynamic communities because a change in the abundance of one type of gene or transcript imposes a change in the percent contribution of the others. By incorporating internal standards, we are able to assess meta-omics datasets within an absolute framework that facilitates comparisons of communities sampled at different times and places in the environment. In the Amazon plume sequence libraries, known copy numbers of internal standards were added at the initiation of sample processing. They consisted of genomic DNA from an exotic bacterium (Thermus thermophilus HB8) for the metagenomes, and of artificial mRNAs and poly(A)-tailed mRNAs for the metatranscriptomes. These standards were identified, counted, and removed from the natural sequences during quality control steps.
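The compositional problem can be illustrated with a small Python sketch. The abundances are hypothetical values chosen for illustration, not measurements from the Amazon plume: only `taxon_A` changes in absolute terms, yet the percent contributions of the other two taxa fall as well.

```python
"""Why percent-of-library values can mislead in dynamic communities.

All abundances are hypothetical illustration values, not data from this study.
"""


def percent_composition(abundances: dict[str, float]) -> dict[str, float]:
    """Convert absolute abundances (e.g. genes per liter) to % of the total."""
    total = sum(abundances.values())
    return {taxon: 100.0 * value / total for taxon, value in abundances.items()}


# Hypothetical absolute abundances (genes per liter) at two sampling times.
before = {"taxon_A": 1.0e11, "taxon_B": 1.0e11, "taxon_C": 2.0e11}
after = dict(before, taxon_A=3.0e11)  # only taxon_A changes

pct_before = percent_composition(before)
pct_after = percent_composition(after)

for taxon in before:
    print(f"{taxon}: {before[taxon]:.1e} -> {after[taxon]:.1e} genes/L, "
          f"{pct_before[taxon]:.1f}% -> {pct_after[taxon]:.1f}%")
```

In percent terms `taxon_B` appears to decline from 25% to about 17%, even though its absolute abundance is identical at both times; an absolute (per-liter) framework avoids this artifact.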

For each station, metagenomes and non-selective metatranscriptomes were each obtained in duplicate for two discrete size fractions (0.2 to 2.0 µm and 2.0 to 156 µm), while poly(A)-selective metatranscriptomes were obtained in duplicate only for the 2.0 to 156 µm size fraction (to increase coverage of the eukaryotic community). Counting each combination of library type and size fraction as one data type (two metagenome fractions, two non-selective metatranscriptome fractions, and one poly(A)-selective fraction), this resulted in a total of 60 datasets (6 stations x 5 data types x 2 replicates) (Table 1). The data collection consisted of 360 million reads following quality control (removal of poor-quality reads, removal of rRNAs from metatranscriptomes, removal of internal standards, and joining of overlapping 150 bp paired ends) and provides an unprecedented view of the metabolic functions of the Bacteria, Archaea, and Eukarya mediating carbon and nutrient cycling in the Amazon River plume.
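The 60-dataset total can be checked by enumerating the sampling design. In this minimal Python sketch the station IDs are those listed in Table 2, while the size-fraction labels and library-type names are our own shorthand; it reproduces the per-library sample counts given in Table 1.

```python
from collections import Counter
from itertools import product

# Station IDs as listed in Table 2; fraction labels are illustrative shorthand.
STATIONS = [10, 3, 2, 23, 25, 27]
REPLICATES = (1, 2)
SMALL, LARGE = "0.2-2.0 um", "2.0-156 um"

# Library type -> size fractions sequenced for that type.
DESIGN = {
    "metagenome": [SMALL, LARGE],
    "non-selective metatranscriptome": [SMALL, LARGE],
    "poly(A)-selective metatranscriptome": [LARGE],  # large fraction only
}

# Every (station, library type, fraction, replicate) combination is one dataset.
datasets = [
    (station, library, fraction, rep)
    for library, fractions in DESIGN.items()
    for station, fraction, rep in product(STATIONS, fractions, REPLICATES)
]

# A "data type" is one library type sequenced from one size fraction.
data_types = {(library, fraction)
              for library, fractions in DESIGN.items()
              for fraction in fractions}
samples_per_library = Counter(library for _, library, _, _ in datasets)

print(len(data_types), "data types,", len(datasets), "datasets")
for library, n in samples_per_library.items():
    print(f"{library}: {n} samples")
```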

Methods 

Detailed sample collection and processing methodology can be found in Additional file 1. Sample sites in the Amazon River plume were chosen to represent a range of salinity, nutrient concentrations, and microbial communities (Additional file 2). Microbial cells were collected by filtration and preserved in RNAlater (Applied Biosystems, Austin, TX, USA). During sample processing, internal standards were added to each sample prior to cell lysis. Samples collected for non-selective metatranscriptomics were processed by extracting total RNA, removing residual DNA, depleting rRNA, linearly amplifying the remaining transcripts, and making double-stranded cDNA for library preparation and sequencing. Poly(A)-selective metatranscriptome samples were processed similarly, except that poly(A)-tailed mRNAs were selectively isolated, eliminating the need for rRNA depletion steps. Metagenomic samples were processed by extracting DNA and removing residual proteins and RNA. Following sample processing, cDNA or DNA was sheared and libraries were constructed for paired-end sequencing (150 x 150) on the Genome Analyzer IIx, HiSeq 2000, MiSeq, or HiSeq 2500 platform (Illumina Inc., San Diego, CA, USA).

From 60 samples, we obtained 8.21 × 10^8 raw sequences containing 1.23 × 10^11 nt. Following sequence quality control, 3.59 × 10^8 reads with a mean length of 195 bp were obtained. Internal standards were quantified and removed, along with any remaining rRNA sequences. The remaining reads were annotated against the RefSeq protein database or a custom marine database using RAPSearch2 [5], and abundance per liter was calculated based on internal standard recovery [6] (Additional file 2).
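The internal-standard scaling can be sketched as follows. This is a minimal illustration of the general approach, assuming the standard and the native sequences are recovered and sequenced with equal efficiency; the exact calculation is described in Additional file 1 and [6]. The function and parameter names are hypothetical, and the numbers are illustrative only.

```python
def abundance_per_liter(target_reads: int,
                        standard_reads_recovered: int,
                        standard_copies_added: float,
                        volume_filtered_l: float) -> float:
    """Scale a read count to copies per liter using an internal standard.

    Assumes target sequences and the internal standard were recovered and
    sequenced with the same efficiency, so copies_added / reads_recovered
    converts sequenced reads into copies present in the original sample.
    """
    if standard_reads_recovered <= 0:
        raise ValueError("no internal standard reads recovered")
    if volume_filtered_l <= 0:
        raise ValueError("filtered volume must be positive")
    copies_per_read = standard_copies_added / standard_reads_recovered
    return target_reads * copies_per_read / volume_filtered_l


# Hypothetical inputs: 5,000 reads of a gene, 2,000 standard reads recovered
# out of 1e9 standard copies added, 2 L of seawater filtered.
print(f"{abundance_per_liter(5_000, 2_000, 1.0e9, 2.0):.2e} copies per liter")
```

Each recovered standard read here stands for 5 × 10^5 copies, so 5,000 target reads in a 2 L sample scale to 1.25 × 10^9 copies per liter.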

Biological and chemical data measured concurrently
with sample collection provide environmental context
for sequence data. These metadata include temperature, 
salinity, oxygen concentration, irradiance, chlorophyll 
concentration, nutrient concentrations, and bacterial 
abundance and production (Additional file 2). Datasets 
describing the phytoplankton communities and other 
features of the June 2010 plume ecosystem have been 
previously published [1,4,7,8]. 

Table 1 Number and types of libraries and reads obtained in the Amazon Continuum Project, June 2010, R/V Knorr

| | Metagenomes | Non-selective metatranscriptomes | Poly(A)-selective metatranscriptomes |
|---|---|---|---|
| Data type | Total community DNA | Total community mRNA | Eukaryotic community mRNA^a |
| # Stations sampled | 6 | 6 | 6 |
| # Size fractions sampled | 2 | 2 | 1 |
| # Replicates | 2 | 2 | 2 |
| # Samples | 24 | 24 | 12 |
| # Raw reads | 3.68 × 10^8 | 8.12 × 10^8 | 4.61 × 10^8 |
| # Joined reads post QC | 9.50 × 10^7 | 1.62 × 10^8 | 1.01 × 10^8 |
| Average joined read length (bp) | 205 | 190 | 185 |
| # rRNA reads | – | 9.53 × 10^7 | 2.34 × 10^5 |
| # Potential protein-encoding reads | 9.44 × 10^7 | 6.52 × 10^7 | 9.86 × 10^7 |

^a The selective metatranscriptomes captured poly(A)-tailed transcripts and are therefore systematically biased against transcripts from eukaryotic organelles.
#, number of; QC, quality control.

Table 2 Reference genome bins garnering the most metagenomic reads, organized by station and domain (top 10 Bacteria, 4 Eukarya, 2 Archaea, and 2 viruses)

| Domain | Taxon | Genes L^-1 | Domain | Taxon | Genes L^-1 |
|---|---|---|---|---|---|
| Station 10 | | | | | |
| Bacteria | Synechococcus sp. CB0205 | 1.46 × 10^12 | Eukarya | Thalassiosira oceanica CCMP1005 | 1.26 × 10^11 |
| Bacteria | SAR86 E | 4.65 × 10^11 | Eukarya | Micromonas sp. RCC299 | 8.27 × 10^10 |
| Bacteria | SAR86 D | 2.55 × 10^11 | Eukarya | Tetrahymena thermophila SB210 | 2.41 × 10^10 |
| Bacteria | Alphaproteobacterium HIMB5 | 2.32 × 10^11 | Eukarya | Strombidinopsis sp. SopsisLIS2011 | 1.67 × 10^10 |
| Bacteria | Cand. Pelagibacter sp. HTCC7211 | 2.22 × 10^11 | | | |
| Bacteria | Cand. Pelagibacter ubique | 1.54 × 10^11 | Archaea | Nitrosopumilus maritimus SCM1 | 1.73 × 10^10 |
| Bacteria | SAR86 C | 1.39 × 10^11 | Archaea | Cand. Nitrosopumilus koreensis AR1 | 1.02 × 10^10 |
| Bacteria | Gammaproteobacterium HIMB55 | 1.33 × 10^11 | | | |
| Bacteria | Synechococcus sp. CB0101 | 1.19 × 10^11 | Virus | Synechococcus phage S-RSM4 | 1.74 × 10^11 |
| Bacteria | Gammaproteobacterium HIMB30 | 1.15 × 10^11 | Virus | Synechococcus phage S-SKS1 | 1.74 × 10^11 |
| Station 3 | | | | | |
| Bacteria | Cand. Pelagibacter sp. HTCC7211 | 2.79 × 10^11 | Eukarya | Micromonas sp. RCC299 | 2.28 × 10^10 |
| Bacteria | Alphaproteobacterium HIMB5 | 2.02 × 10^11 | Eukarya | Tetrahymena thermophila SB210 | 4.93 × 10^9 |
| Bacteria | SAR86 D | 1.64 × 10^11 | Eukarya | Alexandrium tamarense CCMP1771 | 3.71 × 10^9 |
| Bacteria | SAR86 E | 1.33 × 10^11 | Eukarya | Thalassiosira oceanica CCMP1005 | 3.49 × 10^9 |
| Bacteria | Cand. Pelagibacter ubique | 1.17 × 10^11 | | | |
| Bacteria | Alphaproteobacterium HIMB59 | 9.29 × 10^10 | Archaea | Nitrosopumilus maritimus SCM1 | 3.01 × 10^9 |
| Bacteria | SAR86 C | 8.58 × 10^10 | Archaea | Cand. Nitrosoarchaeum limnia | 2.31 × 10^9 |
| Bacteria | Synechococcus sp. WH 8109 | 7.33 × 10^10 | | | |
| Bacteria | Cand. Pelagibacter ubique HTCC1062 | 6.37 × 10^10 | Virus | Synechococcus phage S-RSM4 | 6.70 × 10^10 |
| Bacteria | SAR324 JCVI-SC AAA005 | 5.06 × 10^10 | Virus | Synechococcus phage S-SKS1 | 2.57 × 10^10 |
| Station 2 | | | | | |
| Bacteria | Coraliomargarita akajimensis DSM 45221 | 3.31 × 10^12 | Eukarya | Phaeocystis antarctica | 1.51 × 10^12 |
| Bacteria | Cand. Puniceispirillum marinum IMCC1322 | 7.46 × 10^11 | Eukarya | Phytophthora sojae | 1.02 × 10^12 |
| Bacteria | Gammaproteobacterium HIMB55 | 6.13 × 10^11 | Eukarya | Emiliania huxleyi | 9.44 × 10^11 |
| Bacteria | Synechococcus sp. WH 8109 | 6.07 × 10^11 | Eukarya | Aplanochytrium kerguelense | 7.60 × 10^11 |
| Bacteria | SAR116 HIMB100 | 5.95 × 10^11 | | | |
| Bacteria | Cand. Pelagibacter sp. HTCC7211 | 5.09 × 10^11 | Archaea | Cand. Nitrosopumilus salaria | 1.54 × 10^10 |
| Bacteria | SAR324 JCVI-SC AAA005 | 4.61 × 10^11 | Archaea | Methanomassiliicoccus sp. Mx1-Issoire | 5.89 × 10^9 |
| Bacteria | Gammaproteobacterium HTCC2207 | 3.91 × 10^11 | | | |
| Bacteria | Verrucomicrobiae DG1235 | 3.39 × 10^11 | Virus | Synechococcus phage S-RIP1 | 9.55 × 10^8 |
| Bacteria | Prochlorococcus marinus str. AS9601 | 3.16 × 10^11 | Virus | Phaeocystis globosa virus | 6.20 × 10^8 |
| Station 23 | | | | | |
| Bacteria | Cand. Pelagibacter sp. HTCC7211 | 1.36 × 10^12 | Eukarya | Tetrahymena thermophila SB210 | 2.96 × 10^10 |
| Bacteria | Alphaproteobacterium HIMB5 | 9.43 × 10^11 | Eukarya | Protocruzia adherens Boccale | 2.84 × 10^10 |
| Bacteria | SAR86 D | 9.31 × 10^11 | Eukarya | Strombidinopsis sp. SopsisLIS2011 | 2.82 × 10^10 |
| Bacteria | Alphaproteobacterium HIMB59 | 7.03 × 10^11 | Eukarya | Pseudo-nitzschia multiseries | 1.79 × 10^10 |
| Bacteria | SAR86 E | 6.95 × 10^11 | | | |
| Bacteria | Cand. Pelagibacter ubique | 5.17 × 10^11 | Archaea | Methanosarcina acetivorans C2A | 1.60 × 10^9 |
| Bacteria | SAR86 C | 4.69 × 10^11 | Archaea | Methanosarcina barkeri str. Fusaro | 1.37 × 10^9 |
| Bacteria | Cand. Pelagibacter ubique HTCC1062 | 2.74 × 10^11 | | | |
| Bacteria | SAR324 JCVI-SC AAA005 | 2.33 × 10^11 | Virus | Phaeocystis globosa virus | 1.01 × 10^11 |
| Bacteria | Alphaproteobacterium HIMB114 | 2.31 × 10^11 | Virus | Synechococcus phage S-SM2 | 4.62 × 10^10 |
| Station 25 | | | | | |
| Bacteria | Cand. Pelagibacter sp. HTCC7211 | 6.83 × 10^11 | Eukarya | Pyramimonas obovata CCMP722 | 8.58 × 10^9 |
| Bacteria | Alphaproteobacterium HIMB5 | 4.13 × 10^11 | Eukarya | Phaeocystis antarctica | 6.34 × 10^9 |
| Bacteria | Alphaproteobacterium HIMB59 | 2.35 × 10^11 | Eukarya | Thalassiosira oceanica CCMP1005 | 5.67 × 10^9 |
| Bacteria | Cand. Pelagibacter ubique | 2.07 × 10^11 | Eukarya | Volvox carteri f. nagariensis | 4.93 × 10^9 |
| Bacteria | Prochlorococcus marinus str. AS9601 | 1.87 × 10^11 | | | |
| Bacteria | Prochlorococcus marinus str. MIT 9301 | 1.70 × 10^11 | Archaea | Methanosarcina acetivorans C2A | 9.95 × 10^8 |
| Bacteria | SAR86 E | 1.67 × 10^11 | Archaea | Methanomassiliicoccus sp. Mx1-Issoire | 8.48 × 10^8 |
| Bacteria | SAR86 D | 1.61 × 10^11 | | | |
| Bacteria | Coraliomargarita akajimensis DSM 45221 | 1.45 × 10^11 | Virus | Phaeocystis globosa virus | 4.57 × 10^10 |
| Bacteria | Gammaproteobacterium HTCC2207 | 1.29 × 10^11 | Virus | Synechococcus phage S-SM2 | 2.36 × 10^10 |
| Station 27 | | | | | |
| Bacteria | Prochlorococcus marinus str. AS9601 | 9.43 × 10^12 | Eukarya | Phaeocystis antarctica | 3.04 × 10^10 |
| Bacteria | Prochlorococcus marinus str. MIT 9301 | 8.49 × 10^12 | Eukarya | Tetrahymena thermophila SB210 | 2.25 × 10^10 |
| Bacteria | Cand. Pelagibacter sp. HTCC7211 | 5.70 × 10^12 | Eukarya | Alexandrium tamarense CCMP1771 | 1.56 × 10^10 |
| Bacteria | Prochlorococcus marinus str. MIT 9215 | 4.46 × 10^12 | Eukarya | Monosiga brevicollis | 1.35 × 10^10 |
| Bacteria | Alphaproteobacterium HIMB5 | 3.96 × 10^12 | | | |
| Bacteria | Prochlorococcus marinus str. MIT 9312 | 3.13 × 10^12 | Archaea | Methanomassiliicoccus sp. Mx1-Issoire | 8.73 × 10^9 |
| Bacteria | Cand. Pelagibacter ubique | 2.22 × 10^12 | Archaea | Aciduliprofundum sp. MAR08-339 | 6.10 × 10^9 |
| Bacteria | Prochlorococcus marinus | 1.75 × 10^12 | | | |
| Bacteria | Cand. Pelagibacter ubique HTCC1062 | 1.31 × 10^12 | Virus | Prochlorococcus phage P-SSM2 | 6.08 × 10^11 |
| Bacteria | Alphaproteobacterium HIMB59 | 1.21 × 10^12 | Virus | Synechococcus phage S-SM2 | 3.20 × 10^11 |

Bacterial, archaeal, and viral reads were annotated against the NCBI RefSeq database. Eukaryotic reads were annotated against a custom database containing marine eukaryotic genomes and transcriptomes from NCBI and 112 of the Marine Microbial Eukaryote Transcriptome Sequencing Project datasets that were public at the time of analysis (http://marinemicroeukaryotes.org).

Quality assurance

The SHERA program [9] was used to join the paired-end Illumina reads using the default parameters and a quality metric score of 0.5. SeqTrim [10] was used to trim the joined reads using the default parameters. rRNA and internal standard sequences were identified in the metatranscriptomes using a Blastn search against a custom database containing representative rRNA sequences and internal standard sequences; sequences with a bit score > 50 were identified as either rRNA or internal standards and removed from the datasets. Internal standards were identified in the metagenomes by first performing a Blastn search (bit score cutoff > 50) against the T. thermophilus HB8 genome. Hits were subsequently queried against the RefSeq protein database using Blastx (bit score cutoff > 40) to identify and quantify all T. thermophilus HB8 protein-encoding reads, and these reads were removed from the datasets.
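The bit-score filtering described above can be sketched in Python. The sketch assumes the Blastn or Blastx hits were written in the standard 12-column BLAST+ tabular format (`-outfmt 6`), in which the query ID is the first column and the bit score the last; the function names and toy records are hypothetical. The same helper covers both cutoffs by changing `min_bitscore` (50 for the Blastn searches, 40 for the Blastx step applied to metagenome hits).

```python
from collections.abc import Iterable, Iterator


def ids_to_remove(blast_lines: Iterable[str], min_bitscore: float = 50.0) -> set[str]:
    """Collect query IDs having at least one hit with bit score > min_bitscore.

    Expects BLAST+ tabular output (-outfmt 6): tab-separated columns with the
    query ID in column 1 and the bit score in column 12.
    """
    flagged: set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[11]) > min_bitscore:
            flagged.add(fields[0])
    return flagged


def drop_reads(reads: Iterable[tuple[str, str]],
               flagged: set[str]) -> Iterator[tuple[str, str]]:
    """Yield (read_id, sequence) pairs that were not flagged."""
    for read_id, seq in reads:
        if read_id not in flagged:
            yield read_id, seq


# Toy example: read1 matches a reference strongly; read2 only weakly.
hits = [
    "read1\trRNA_ref\t99.3\t150\t1\t0\t1\t150\t12\t161\t1e-72\t270",
    "read2\tstd_ref\t88.0\t40\t5\t0\t1\t40\t3\t42\t2e-04\t45.4",
]
reads = [("read1", "ACGTACGT"), ("read2", "GGCCTTAA"), ("read3", "TTGACCAA")]

flagged = ids_to_remove(hits)
kept = list(drop_reads(reads, flagged))
print(sorted(flagged), [read_id for read_id, _ in kept])
```

With the default cutoff only `read1` is removed; rerunning `ids_to_remove(hits, 40.0)` would also flag `read2`, illustrating how the lower Blastx cutoff is more inclusive.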

Initial findings 

Metagenomic reads from surface waters of the six Amazon River plume stations were assigned to bacterial, archaeal, eukaryotic, and viral taxa based on best hits to reference genomes. Among autotrophic bacteria, Synechococcus was the largest contributor to the metagenomes at locations closest to the river mouth (Stations 10 and 3; up to approximately 1.5 × 10^12 genes L^-1) and was replaced by Prochlorococcus at more oceanic locations (Stations 25 and 27) (Table 2). Among heterotrophic bacteria, SAR86 had the largest gene abundance closest to the river mouth (Station 10; approximately 8.6 × 10^11 genes L^-1). SAR11 clade members (HTCC7211, HIMB5) were also abundant here, and became the dominant contributors of heterotrophic bacterial genes at more oceanic stations (up to 5.7 × 10^12 genes L^-1) (Table 2). Genes binning to SAR324 genomes were abundant at three stations (Stations 2, 3, and 23; Table 2), with the Amazon plume sequences aligning with heterotrophic members of this group [11]. Station 2 had a distinctive bacterial community relative to the other plume stations, dominated by genes from Verrucomicrobia related to Coraliomargarita akajimensis DSM 45221 and strain DG1235, with substantial contributions from SAR116 taxa (IMCC1322, HIMB100). Coraliomargarita akajimensis DSM 45221 was also among the most abundant genome bins at Station 25 (Table 2).
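The Station 10 SAR86 value matches the sum of the three SAR86 genome bins (C, D, and E) listed for that station in Table 2. The sketch below reproduces that arithmetic with values copied from Table 2; treating the reported figure as a sum over bins is our reading of the table rather than a procedure stated in the text, and the helper name is hypothetical.

```python
# Genes per liter for selected Station 10 bacterial bins, copied from Table 2.
station10_bins = {
    "SAR86 E": 4.65e11,
    "SAR86 D": 2.55e11,
    "SAR86 C": 1.39e11,
    "Alphaproteobacterium HIMB5": 2.32e11,
    "Cand. Pelagibacter sp. HTCC7211": 2.22e11,
}


def sum_bins(bins: dict[str, float], prefix: str) -> float:
    """Sum abundances of all genome bins whose name starts with prefix."""
    return sum(value for name, value in bins.items() if name.startswith(prefix))


sar86_total = sum_bins(station10_bins, "SAR86")
print(f"SAR86 at Station 10: {sar86_total:.2e} genes per liter")
```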

Among eukaryotic taxa, diatoms and the green alga Micromonas contributed the greatest number of genes at lower salinities, while haptophytes (binning to Phaeocystis antarctica), dinoflagellates (binning to Alexandrium tamarense CCMP1771), and relatives of the green alga Pyramimonas obovata CCMP722 increased in importance at more saline stations (Table 2). Among Archaea, members of the ammonia-oxidizing genus Nitrosopumilus and related genera contributed the most genes at stations closest to the river mouth, although their gene abundances were roughly 100-fold lower than those of the most abundant bacterial taxa. There were very few archaeal genes at the outermost stations (Stations 25 and 27), and these binned largely to methanogen sequences. The viral sequences were dominated by cyanobacterial phages (Table 2).
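The archaeal-versus-bacterial contrast is an order-of-magnitude comparison. As a rough check, the sketch below compares the top archaeal bin and the top bacterial bin at Station 10, using the values from Table 2; picking those two bins as the comparison is our assumption.

```python
import math

top_bacterium = 1.46e12  # Synechococcus sp. CB0205, Station 10 (Table 2)
top_archaeon = 1.73e10   # Nitrosopumilus maritimus SCM1, Station 10 (Table 2)

fold = top_bacterium / top_archaeon
print(f"{fold:.0f}-fold difference, {math.log10(fold):.2f} orders of magnitude")
```

The ratio is about 84-fold, or close to two orders of magnitude.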

Patterns of gene and transcript abundance provided insights into transcriptional activity by taxon and habitat (that is, cells that were free-living versus those that were particle-associated) for the dominant bacterial groups. Verrucomicrobia (order Puniceicoccales) maintained cellular transcript inventories of up to 14 transcripts/gene in particle-associated cells and averaged 2 transcripts/gene overall (Figure 2). In contrast, members of the class Flavobacteria averaged < 0.5 transcripts/gene. Particle-associated cells in each of these major taxa typically had more transcripts per gene copy than did free-living cells (averaging 2.0 versus 0.15 transcripts/gene) (Figure 2). The abundance of transcripts originating from particle-associated versus free-living bacteria varied along the plume: mRNAs from free-living cells contributed only 30 to 60% of the metatranscriptome at landward stations, but > 90% at outer plume stations. Environmental data (Additional file 2) indicate that Station 10 had the lowest salinity (22.6) and Station 27 the highest (36.0). Station 10 was the most strongly influenced by riverine inputs, particularly of inorganic nitrogen.
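Transcripts per gene is the ratio of two absolute inventories (transcripts L^-1 divided by genes L^-1) for the same taxon and size fraction, and the free-living contribution is that fraction's share of the summed transcripts. The minimal Python sketch below uses hypothetical inventories, chosen only to echo the averages quoted above and not taken from the data; the class and function names are our own.

```python
from dataclasses import dataclass


@dataclass
class FractionInventory:
    """Absolute inventories for one taxon in one size fraction (per liter)."""
    genes_per_l: float
    transcripts_per_l: float

    @property
    def transcripts_per_gene(self) -> float:
        """Ratio plotted against the 1:1, 10:1 and 1:10 lines in Figure 2."""
        return self.transcripts_per_l / self.genes_per_l


def free_living_share(free_living_tx: float, particle_tx: float) -> float:
    """Percent of transcripts contributed by the free-living (0.2-2.0 um) fraction."""
    return 100.0 * free_living_tx / (free_living_tx + particle_tx)


# Hypothetical inventories, not measurements from this study.
particle = FractionInventory(genes_per_l=1.0e9, transcripts_per_l=2.0e9)
free = FractionInventory(genes_per_l=2.0e10, transcripts_per_l=3.0e9)

print(f"particle-associated: {particle.transcripts_per_gene:.2f} transcripts/gene")
print(f"free-living: {free.transcripts_per_gene:.2f} transcripts/gene")
print(f"free-living share of mRNA: "
      f"{free_living_share(free.transcripts_per_l, particle.transcripts_per_l):.0f}%")
```

Note that the free-living fraction can dominate the transcript pool (60% here) while having far fewer transcripts per gene, because it holds many more gene copies per liter.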

Future directions 

The Amazon River plume is immense in scale and sensitive 
to anthropogenic forcing. This multi-omics dataset is the
first of four high-throughput metagenomic and metatranscriptomic
sequence collections being produced for the
Amazon River Continuum as part of the ANACONDAS 
and ROCA projects (http://amazoncontinuum.org). These 
projects aim to improve predictive capabilities for climate 
change impacts on the marine biosphere, focusing on the 
Amazon ecosystem, and to better our understanding of 
feedbacks on the carbon cycle. Processes in the river 
and ocean are tightly linked from physical, biological, 
and biogeochemical perspectives. Thus, the complete data 
collection will include two datasets from the Amazon 
plume (June 2010 and July 2013) and two from the 
Amazon River (Obidos to Macapa and Belem; June 
2011 and July 2013). These high-coverage, size-discrete, 
and replicated datasets are all benchmarked with
internal genomic and mRNA standards for comparative
quantitative metagenomics and metatranscriptomics. 
Insights from these meta-omics datasets are enhancing 
predictive capabilities regarding the interplay between 
marine microbial communities, biogeochemical cycling, 
and carbon sequestration in the ocean. 
Figure 2 Inventories of genes and transcripts for eight bacterial taxa in surface waters of the Amazon plume. Symbols represent the
mean of duplicate analyses at six stations, color-coded by taxon and size fraction (particle-associated or free-living). Lines indicate a 1:1 ratio of 
transcripts:genes (black) or 10:1 and 1:10 ratios (gray). The purple line indicates the ratio of transcripts:genes for exponentially growing laboratory 
cultures of Escherichia coli [12,13]. Dominant bacterial groups are as follows: Oscillatoriales = Trichodesmium; Prochlorales = Prochlorococcus; 
Chroococcales = Synechococcus; Nostocales = Richelia; Puniceicoccales = Verrucomicrobia. 
Availability of supporting data

Sequences from the June 2010 Amazon Continuum study are available from NCBI under accession numbers [SRP039390] (metagenomes), [SRP037995] (non-selective metatranscriptomes), and [SRP039544] (poly(A)-selected metatranscriptomes). The NCBI sequences are FASTQ files from which internal standard sequences and rRNA sequences (metatranscriptomes only) were removed prior to deposition. Sequences are also available at the Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA) database under project number CAM_P_0001194. The CAMERA sequences are quality-controlled FASTA files of joined paired-end reads, also with internal standards and rRNA sequences (metatranscriptomes only) removed. Metadata accompanying the omics datasets are provided in Additional file 2. ANACONDAS and ROCA project data are also available at the BCO-DMO data repository (http://www.bco-dmo.org/project/2097).

Additional files 



Additional file 1: Detailed methods. Description of metagenome and 
metatranscriptome sample processing, sequencing, and data analysis, 
including internal standard additions and analysis. 

Additional file 2: Metadata. Metadata accompanying the
metagenomic and metatranscriptomic datasets, including sample station
locations, environmental conditions, and library sizes and statistics.